Comparison between novel online models and the AJCC 8th TNM staging system in predicting cancer-specific and overall survival of small cell lung cancer

Background: Most previous studies on predictive models for patients with small cell lung cancer (SCLC) were single-institution studies or showed relatively low Harrell concordance index (C-index) values. To build an optimal nomogram, we collected clinicopathological characteristics of SCLC patients from the Surveillance, Epidemiology, and End Results (SEER) database.
Methods: A total of 24,055 samples with SCLC from 2010 to 2016 in the SEER database were analyzed. The samples were grouped into a derivation cohort (n=20,075) and an external validation cohort (n=3,980) based on America's different geographic regions. Cox regression analyses were used to construct nomograms predicting cancer-specific survival (CSS) and overall survival (OS) using the derivation cohort. The nomograms were internally validated by the bootstrapping technique and externally validated by calibration plots. The C-index was computed to compare the accuracy and discrimination power of our nomograms with those of the 8th version of the AJCC TNM staging system and of nomograms built in previous studies. Decision curve analysis (DCA) was applied to explore whether the nomograms had better clinical efficiency than the 8th version of the AJCC TNM staging system.
Results: Age, sex, race, marital status, primary site, differentiation, T classification, N classification, M classification, surgical type, lymph node ratio, radiotherapy, and chemotherapy were chosen as predictors of CSS and OS for SCLC by stepwise multivariable regression and were put into the nomograms. Internal and external validations confirmed that the nomograms were accurate in prediction. The C-indexes of the nomograms were relatively satisfactory in the derivation cohort (CSS: 0.761, OS: 0.761) and the external validation cohort (CSS: 0.764, OS: 0.764). The accuracy of the nomograms was superior to that of nomograms built in previous studies. DCA showed that the nomograms conferred better clinical efficiency than the 8th version of the TNM staging system.
Conclusions: We developed practical nomograms for CSS (https://guowei2020.shinyapps.io/DynNom-CSS-SCLC/) and OS (https://drboidedwater.shinyapps.io/DynNom-OS-SCLC/) prediction of SCLC patients, which may assist clinicians in individualized therapeutics.

Introduction
Lung cancer is a leading cause of cancer death worldwide. Small cell lung cancer (SCLC), which originates from neuroendocrine cells, is an aggressive cancer that accounts for approximately 15% of all lung cancers and causes 30,000 deaths annually (1). Unlike non-small cell lung cancer, which has shown excellent responses to targeted therapy or immunotherapy (2-4), recent clinical trials have shown that new drugs bring only limited benefit for SCLC (5,6). SCLC is characterized by a high degree of malignancy, a high doubling rate, and early and extensive metastasis (7). The 5-year survival probability for patients with SCLC receiving no active treatment is less than 5%, with an average overall survival (OS) time of merely 2-4 months (8, 9). The National Comprehensive Cancer Network Clinical Practice Guidelines in Oncology (version 2.2018) suggest that stage I SCLC patients receive surgery and adjuvant chemotherapy (10). However, more than 80% of SCLC patients are diagnosed at stage III/IV, which contributes to the high mortality of SCLC (11). Most early-stage SCLC patients can benefit from platinum-based chemotherapy and radiation, while patients with advanced or metastatic disease receive platinum-based chemotherapy alone (12). However, nearly all SCLC will ultimately recur because of early dissemination and acquired drug resistance.
Because of the heterogeneous nature of SCLC, it needs to be dealt with as an individual entity. In this respect, the latest 8th version of the American Joint Committee on Cancer (AJCC) tumor, lymph node, metastasis (TNM) staging system can predict the prognosis of SCLC more precisely than the 7th version (13). However, unlike the TNM staging systems for gastric and rectal cancer, which take both the anatomical regions and the positive lymph node count into consideration, the TNM staging system for SCLC only includes the anatomical regions of lymphatic metastasis (14, 15). The lymph node ratio (LNR), calculated as LNR = number of positive lymph nodes/number of examined lymph nodes, takes the numbers of both positive and examined lymph nodes into account and can solve this problem (16). Moreover, the TNM staging system does not include demographic data (age, sex, race, and marital status), histopathologic features (laterality, site, differentiation, and histology), or treatment modalities (surgery, radiotherapy, and chemotherapy), which may also be predictors for SCLC patients. The TNM staging system alone is therefore not sufficient for long-term survival prediction in SCLC patients. An optimal model with better predictive performance is needed, and a nomogram is a suitable tool to address these problems.
A nomogram is a visualized model that predicts survival probability using multivariable Cox or other regression analyses of potential prediction variables. Most previous studies on nomograms for SCLC patients were single-center studies or showed relatively low Harrell concordance index (C-index) values, the C-index being the indicator of discrimination power and accuracy of prediction (17-20). We performed this study to construct and validate a superior prognostic nomogram that helps clinicians choose treatment strategies. The TRIPOD reporting checklist was used to guide the reporting of this research.

Data origin
The Surveillance, Epidemiology, and End Results (SEER) database contains patient information from 18 malignancy registries of the National Cancer Institute. As a national data bank, SEER covers information on approximately 30% of the US population (21). Clinicopathological information of SCLC patients was extracted from the SEER database (version 8.3.9; https://seer.cancer.gov/resources/). The 3rd edition of the International Classification of Diseases for Oncology was used to determine the primary site and histological type of the malignancy. The requirements for informed consent by patients and ethical approval by an institutional review board were waived, as all patient information in the SEER database was deidentified before publication and included no information that could identify the patients. We performed this study in line with the Harmonized Tripartite Guideline for Good Clinical Practice from the International Conference on Harmonization and the Declaration of Helsinki (as revised in 2013).

Patient screening
Altogether, 24,055 samples from the SEER database were selected for further analysis. Patients meeting the following conditions were included: (I) primary lung cancer patients from 2010 to 2016 with primary site coding of C34.0-C34.9; (II) diagnosed with the histological type of small cell (ICD-O-3 codes: 8002, 8041-8045) with pathologic verification; (III) recognized as having only one primary tumor. Samples meeting any of the following criteria were excluded: (I) younger than 18 years; (II) diagnosed only by autopsy or death certificate; (III) missing information about race, marital status at diagnosis, or laterality; (IV) missing information concerning the TNM staging system, number of examined lymph nodes, number of positive lymph nodes, surgical type, radiotherapy, chemotherapy, or overall survival. The cancer stage of the study cohort was updated on the basis of the AJCC 8th TNM staging system (22).
The purchased/referred care delivery area (PRCDA) region was used to identify the geographic location of patients. Patients from the East and Pacific Coast regions of America were put into the derivation cohort, while patients from the PRCDA regions of Alaska, Northern Plains, and Southwest were defined as the external validation cohort. The nomogram was internally validated with the bootstrapping technique in the derivation cohort and externally validated by calibration plots in the external validation cohort.

Research variables and outcomes
Demographic data of the samples concerning age at diagnosis, sex, race, PRCDA region, and marital status at diagnosis were obtained. In addition, histopathologic characteristics of the cancer, including primary site, laterality, differentiation, histological type, T classification, N classification, M classification, and LNR, were extracted. The therapeutic regimens concerning surgical type, radiotherapy, and chemotherapy were also obtained. Continuous variables such as age and LNR were converted into categorical variables. Age was categorized into <50, 50-59, 60-69, 70-79, and ≥80 years. For patients who underwent lymph node examination, LNR was dichotomized via the X-tile software (version 3.6.1; https://medicine.yale.edu/lab/rimm/research/software/) at the cutoff value that produced the largest OS difference between the two groups (23,24).
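The categorization described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names and group labels are hypothetical, the default cutoff of 0.6 is the X-tile value reported in the Discussion, and treating the cutoff as exclusive (LNR > 0.6 is "high") is an assumption, since the text does not state how a value exactly at the cutoff was handled.

```python
from typing import Optional

# Upper bounds (exclusive) and labels for the age bands given in the text.
AGE_BANDS = [(50, "<50"), (60, "50-59"), (70, "60-69"), (80, "70-79")]


def age_group(age_years: int) -> str:
    """Map age at diagnosis to the five age categories used in the study."""
    for upper, label in AGE_BANDS:
        if age_years < upper:
            return label
    return ">=80"


def lymph_node_ratio(positive: int, examined: int) -> Optional[float]:
    """LNR = positive nodes / examined nodes; None when no nodes were examined."""
    if examined <= 0:
        return None
    if not 0 <= positive <= examined:
        raise ValueError("positive nodes must lie between 0 and examined nodes")
    return positive / examined


def lnr_group(positive: int, examined: int, cutoff: float = 0.6) -> str:
    """Dichotomize LNR at the cutoff (assumption: values above the cutoff are 'high')."""
    lnr = lymph_node_ratio(positive, examined)
    if lnr is None:
        return "no LN examined"  # hypothetical label for patients without node examination
    return "high" if lnr > cutoff else "low"
```

For example, a 67-year-old patient with 3 positive nodes out of 4 examined would fall into the "60-69" age band and the "high" LNR group under these assumptions.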
In this study, we chose cancer-specific survival (CSS) and OS as the endpoints. CSS was defined as the interval between diagnosis and SCLC-specific death, and OS as the interval between diagnosis and all-cause death, both measured in months. Follow-up data and survival outcome information in the SEER database are updated every year, and the latest ending date of follow-up information was December 31, 2016.

Construction and evaluation of predictive model
To make the Cox model more accurate, continuous variables were converted into categorical variables and presented as count (percentage). Baseline features of samples in the derivation and external validation cohorts were compared using standardized differences. Survival outcomes were presented with Kaplan-Meier curves, and a two-sided log-rank test was used to detect survival differences.
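The two descriptive tools mentioned above can be illustrated with a small standard-library sketch. It is not the authors' R code: the function names are hypothetical, the standardized difference shown is the common two-proportion form (one level of a categorical variable at a time), and the Kaplan-Meier function uses the usual convention that patients censored at time t are still at risk for events at t.

```python
import math


def standardized_difference(p_derivation: float, p_validation: float) -> float:
    """Standardized difference between two proportions of the same category level."""
    pooled_var = (p_derivation * (1 - p_derivation)
                  + p_validation * (1 - p_validation)) / 2
    if pooled_var == 0:
        if p_derivation == p_validation:
            return 0.0
        raise ValueError("undefined when both proportions are 0 or 1 and differ")
    return (p_derivation - p_validation) / math.sqrt(pooled_var)


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up in months
    events : 1 if the death of interest occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve
```

A call such as `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` returns one step per distinct event time; the log-rank test itself is not shown here.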
Univariable Cox analysis was applied to screen variables that could potentially predict CSS or OS. Variables with a P value for the hazard ratio (HR) <0.1 in univariable Cox analysis were entered into multivariable Cox regression using the stepwise Akaike Information Criterion (stepAIC) method to select the optimal predictors for the final models (25). The HR with 95% confidence interval (CI) was then reported. The predictive ability of the model was evaluated in terms of discrimination, accuracy, and clinical efficiency: C-indexes were calculated to test the discrimination power, calibration and receiver operating characteristic (ROC) curves were plotted to test the accuracy, and decision curve analysis (DCA) was performed to assess the clinical efficiency (26-28).
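To make the discrimination measure concrete, the following is a minimal sketch of Harrell's C-index for right-censored data. It is an illustrative O(n²) implementation with hypothetical names, not the routine used in the study; pairs with tied follow-up times are simply skipped, which is a simplification compared with full implementations.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index.

    times       : follow-up times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event strictly
            # before patient j's follow-up ended.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the C-indexes in this study are reported.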

Establishment and validation of the nomogram
Variables selected by multivariable Cox regression analysis using the stepAIC method were put into the nomogram for CSS or OS of SCLC patients. The bootstrap technique was used for internal validation of the model with 1000 resamples of the derivation cohort. Calibration plots of 1-, 3-, and 5-year OS were used to compare the nomogram-predicted OS rate with the actual OS rate. The accuracy was considered high when the predictions fell close to the diagonal line of the calibration plot.
We performed the statistical analyses with R software (version 4.1.0; http://www.r-project.org). The statistical tests were two-sided, and statistical significance was defined as a P value smaller than 0.05.
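The resampling idea behind bootstrap validation can be sketched as follows. Note the stated simplification: this example resamples patients with replacement and recomputes the C-index of a fixed risk score to obtain a percentile interval. It does not refit the Cox model in each resample, so it is not the optimism-corrected procedure that internal validation of a nomogram normally involves. All names, the seed, and the default of 1000 resamples (taken from the text) are illustrative choices.

```python
import random


def c_index(times, events, scores):
    """Harrell's C-index (pairs with tied times are skipped)."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def bootstrap_c_index(times, events, scores, n_resamples=1000, seed=0, alpha=0.05):
    """Percentile bootstrap interval for the C-index of a fixed risk score."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample patients with replacement
        try:
            stats.append(c_index([times[k] for k in idx],
                                 [events[k] for k in idx],
                                 [scores[k] for k in idx]))
        except ValueError:
            continue  # degenerate resample without comparable pairs
    if not stats:
        raise ValueError("no usable resamples")
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper
```

In an optimism-corrected validation, the model would instead be refitted on each resample and evaluated on both the resample and the original data, with the average difference subtracted from the apparent C-index.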

Baseline characteristics
From January 2010 to December 2016, 42,031 SCLC patients were reported in the SEER database. After applying the screening criteria, 24,055 samples were retained as the study cohort. All of these had information on OS, and 23,883 had information on CSS. A total of 20,075 patients (16,655 died before the last follow-up) from the East and Pacific Coast regions were put into the derivation cohort, and 3,980 patients (3,299 died before the last follow-up) from the Alaska, Northern Plains, and Southwest regions were defined as the external validation cohort. The sample screening process is shown in Table 1. Baseline characteristics of all samples grouped by derivation and external validation cohort are presented in Table 2. The median (interquartile range [IQR]) age at diagnosis of the whole study cohort was 67 years (60-74 years). The majority of the samples were diagnosed at the age of ≥50 years (96.2%), were White (86.3%), were diagnosed at T3-4 classification (65.9%), N2-3 classification (77.8%), and M1 classification (67.8%), and received chemotherapy (72.1%). Few patients (2.5%) received surgery, and nearly half of the patients (49.5%) received radiotherapy.

Establishment and validation of nomogram
Variables selected by multivariable Cox analysis using the stepAIC method were incorporated into the nomograms (Figures 3, 4). As shown in the nomograms, chemotherapy and surgery exerted the greatest impact, followed by M classification, differentiation, age at diagnosis, and other variables. Each level of the variables in the nomograms can be translated into a score on the point scale. The total points are computed by adding the corresponding points of all variables, and the estimated CSS or OS is obtained by drawing a straight line down from the total points.
The C-indexes of our nomograms were both 0.761 (95% CI: 0.757-0.765) for CSS and OS. Internal validation by the bootstrapping technique with 1000 resamples of the derivation cohort showed that the adjusted C-indexes for the nomograms were both 0.760, indicating sound predictive ability. As for the external validation, the C-indexes of the nomograms in the external validation cohort were both 0.764 (95% CI: 0.755-0.774) for CSS and OS, which were even slightly higher than those of the derivation cohort. In addition, the calibration curves showed that the predictions fell close to the diagonal line, demonstrating good agreement between predicted and actual CSS or OS rates (Figure 5). Taken together, these results indicate that the predictions of the nomograms are accurate.
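How a Cox-based nomogram turns variable levels into points and points into a survival estimate can be sketched as below. Every number here is hypothetical and chosen only for illustration: the coefficients, the three variables shown, and the baseline 1-year survival are not the fitted values of the nomograms in this study. The sketch assumes the standard Cox relation S(t | x) = S0(t)^exp(lp), with the reference patient defined as the one with the lowest-risk level of every variable.

```python
import math

# Hypothetical log hazard ratios per variable level (NOT the paper's estimates).
COEFS = {
    "chemotherapy": {"yes": 0.0, "no": 1.2},
    "m_class": {"M0": 0.0, "M1": 0.7},
    "lnr": {"low": 0.0, "high": 0.3},
}
# Hypothetical 1-year baseline survival for the reference patient.
BASELINE_SURVIVAL_1Y = 0.60


def points_per_unit(coefs):
    """Scale factor so the variable with the widest effect spans 0-100 points."""
    widest = max(max(levels.values()) - min(levels.values())
                 for levels in coefs.values())
    return 100.0 / widest


def total_points(patient, coefs=COEFS):
    """Sum the points of each variable level for one patient."""
    scale = points_per_unit(coefs)
    return sum((coefs[var][patient[var]] - min(coefs[var].values())) * scale
               for var in coefs)


def survival_from_points(points, coefs=COEFS, s0=BASELINE_SURVIVAL_1Y):
    """Convert total points back to a survival probability via the Cox relation."""
    linear_predictor = points / points_per_unit(coefs)
    return s0 ** math.exp(linear_predictor)
```

With these made-up values, a patient with M1 disease who received chemotherapy and has a low LNR scores roughly 58 points, and reading the survival axis corresponds to evaluating `survival_from_points` at that total.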

Comparison of the predictive power of the 8th version of the AJCC TNM staging system and the nomograms
The C-indexes of the 8th AJCC TNM staging system in the derivation cohort [0.622 (95% CI: 0.616-0.627) for CSS and 0.614 (95% CI: 0.609-0.619) for OS] and the external validation cohort [0.617 (95% CI: 0.605-0.628) for CSS and 0.613 (95% CI: 0.602-0.625) for OS] were inferior to those of the nomograms. In addition, the areas under the ROC curve of the nomograms were higher than those of the 8th AJCC TNM staging system for predicting 1-, 3-, and 5-year CSS or OS in both the derivation and external validation cohorts, indicating better prediction accuracy (Figure 6). DCA plots for CSS and OS demonstrated that the nomograms performed well in terms of clinical efficiency and that patients might benefit more from the nomograms than from the TNM staging system in both the derivation and validation cohorts (Figure 7).
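The quantity plotted in a decision curve is the net benefit at each threshold probability. The sketch below shows the standard binary-outcome formula, net benefit = TP/n - FP/n × pt/(1 - pt), as a simplified illustration with hypothetical function names. It assumes that the outcome at a fixed time point (for example, death within 1 year) is known for every patient, so it ignores the censoring that a survival-based DCA has to handle.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    predicted_risk : predicted event probabilities
    outcome        : 1 if the event occurred by the time point, else 0
    threshold      : threshold probability pt, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                    if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A decision curve is obtained by evaluating these functions over a range of thresholds for each model; a model is preferred over the range where its net benefit is higher than that of the alternatives and of the treat-all and treat-none (net benefit 0) strategies.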

Construction of a webserver of the nomogram
Online dynamic nomograms based on our study were constructed for CSS (https://guowei2020.shinyapps.io/DynNom-

Discussion
In this multicenter study, SCLC patients' data were extracted from the SEER database according to the screening criteria, and univariable and multivariable Cox regression analyses were [...] radiotherapy, and chemotherapy. Older age implies more degenerative changes in organ function and a higher prevalence of comorbidities, which lead to worse outcomes (29). Male patients showed worse survival than female patients, as was also seen in three other studies (16,19,20). White patients exhibited worse survival outcomes than Black patients, which was similar to the results of two previous studies (20,30). As for marital status, our study revealed that married SCLC patients had better survival than patients with other marital statuses, which has not been reported in other studies on SCLC. While studies on non-small cell lung cancer suggested [...] prognosis prediction. The nodal staging of the 8th AJCC TNM staging system rests on the concept that lymphatic metastasis starts in the nodes nearest to the primary malignancy and subsequently spreads to nodes farther from the tumor (34). The N classification defines N1 as metastasis to ipsilateral peribronchial and/or hilar nodes and intrapulmonary nodes, N2 as metastasis to ipsilateral mediastinal and/or subcarinal nodes, and N3 as metastasis to contralateral mediastinal, contralateral hilar, ipsilateral or contralateral anterior scalene, and supraclavicular nodes, without taking the number of examined or positive lymph nodes into account (35).
To solve this problem, we added the dichotomized LNR value into the models. The optimal cut-off value calculated by the X-tile software was 0.6, whereas in Wang's study the cut-off value was 0.01 (16). This may be because Wang's study cohort consisted of patients with early-stage, resected SCLC, whereas our study cohort included unresectable SCLC patients who underwent lymph node biopsy. In fact, our nomogram is suitable for all T1-4N0-3M0-1 SCLC patients. We searched PubMed for previous studies on nomograms for SCLC. Xie's group introduced two nomograms predicting OS for SCLC patients that incorporated pretreatment peripheral blood markers, including ratios of inflammatory cell counts and red cell distribution width. Xie's nomograms were based on a single-institution study with 938 patients and had a lower C-index than ours (0.73 vs. 0.761). Moreover, Xie's group built two separate nomograms for patients with extensive-stage and limited-stage SCLC, whereas our single nomogram for OS can be used in all SCLC patients (18). Pan's and Xiao's nomograms for OS of SCLC were also developed from single centers and had lower C-indexes than ours (0.68 vs. 0.761, 0.60 vs. 0.761) (17, 19). The recent nomogram study for SCLC conducted by Wang used clinical data of 24,680 patients from the National Cancer Database (20). However, compared with Wang's nomogram, ours showed better discrimination power (C-index: 0.761 vs. 0.722), possibly because Wang's nomogram lacked information on site, differentiation, and detailed T, N, and M classification.
FIGURE 3 Nomogram for 1-, 3- and 5-year CSS prediction of SCLC patients. CSS, cancer-specific survival; SCLC, small cell lung cancer; LN, lymph node; LNR, lymph node ratio; NOS, not otherwise specified.
As for the treatment modality, the surgery group exhibited a survival benefit, and patients who underwent lobectomy benefited most from surgery. However, only 2.6% of the overall study cohort received surgery. The poor effectiveness of current therapeutic strategies against SCLC progression is connected with the lack of early diagnosis, and these patients usually have no chance of receiving surgery, because distant metastasis or paraneoplastic syndrome limits the therapeutic potential of surgery (36). Improved survival after surgery in patients with early-stage SCLC has been shown in previous studies (10,12,30). First-line standard chemotherapy for SCLC combines etoposide or irinotecan with platinum. Concurrent or sequential radiotherapy is needed for limited-stage disease, while chemotherapy serves as the mainstream strategy in the first-line setting (37). As can be seen in Figures 3, 4, chemotherapy made the largest contribution to our nomograms, indicating the great importance of chemotherapy for SCLC. Moreover, our study revealed that most SCLC patients (72.1%) received chemotherapy and nearly half (49.5%) received radiotherapy, which is consistent with Wang's large cohort study (20). Targeted therapy and immunotherapy have also been intensively researched and have shown some encouraging results in recent years; however, the treatment information available in the SEER database did not include targeted therapy or immunotherapy (5,6,37,38).
FIGURE 4 Nomogram for 1-, 3- and 5-year OS prediction of SCLC patients. OS, overall survival; SCLC, small cell lung cancer; LN, lymph node; LNR, lymph node ratio; NOS, not otherwise specified.
FIGURE 6 ROC curves of the 8th version of the TNM staging system and the nomogram for predicting 1-, 3-, and 5-year CSS in the derivation cohort (A-C) and validation cohort (D-F) and 1-, 3-, and 5-year OS in the derivation cohort (G-I) and validation cohort (J-L). ROC, receiver operating characteristic; AJCC, American Joint Committee on Cancer; TNM, tumor, node, metastasis; CSS, cancer-specific survival; OS, overall survival.
As far as we know, our study, as a multicenter study with a large cohort, introduces nomograms with the highest C-index, indicating the highest discrimination power and accuracy for prognosis prediction in all-stage SCLC. Several limitations of this research have to be acknowledged. First, it is a retrospective study, making it susceptible to the inherent weaknesses of retrospective data collection. Second, the SEER database lacks some variables that potentially influence CSS or OS, such as smoking history, laboratory test results (e.g., neutrophil or lymphocyte count, platelet count, and red cell distribution width), and tumor markers associated with SCLC. Third, detailed treatment information, such as the sequence of chemotherapy and surgery/radiotherapy and the specific radiotherapy, chemotherapy, targeted therapy, or immunotherapy regimens, cannot be found in the SEER database. Consequently, the risk scores of different therapeutic strategies cannot presently be applied as a guideline for regimen choice, because clinical therapeutic modalities need to be chosen according to all the covariables that influence the survival outcome.

Conclusions
We constructed and validated nomograms for CSS and OS of SCLC, which demonstrated prediction performance superior to that of the AJCC 8th TNM staging system and of nomograms built in previous studies. Web servers were built based on the nomograms, which may help clinicians in decision-making.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions
Conception/design: ML, PZ, and YG. Collection and/or assembly of data: ML, PZ, SW, and WG. Data analysis and interpretation: PZ and SW. Manuscript writing: SW, WG, and YG. Funding support: YG. All authors contributed to the article and approved the submitted version.

Funding
This study was funded by Naval Medical University research project (2021QN15).